Prompt:
In the file am_jobdata.csv you will find information on the first job outcomes of college graduates from India, including their salaries, test scores, high school grades and other data. We'll try to make some interesting graphs in order to:
In any data analysis exercise, visualization is a good first step that can confirm assumptions or shine light on some promising directions for analysis.
See the Seaborn Gallery for ideas and directions on types of graphs that can be plotted.
Hints:
To get started, you may load the .csv file into Excel, Google Sheets or some other spreadsheet program to see what the data is and what its range is.
We'll be using seaborn to help us plot the data. Seaborn is a library that helps make visually appealing graphs and builds on the matplotlib library. Head to the Seaborn tutorials page for some help on how to get started with plotting graphs in seaborn.
Think carefully about what type of graphs you want to plot - histograms, scatterplots, box-plots, violin plots, grouped data plots can all be useful.
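As a rough illustration of how a few of these plot types differ, here is a minimal sketch on synthetic data. The DataFrame `toy` and its column names (`salary`, `test_score`, `gender`) are assumptions made up for the example, not the real schema of am_jobdata.csv.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data; the column names are assumptions, not the real schema.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "salary": rng.lognormal(mean=12.5, sigma=0.5, size=200),
    "test_score": rng.normal(loc=500, scale=80, size=200),
    "gender": rng.integers(0, 2, size=200),
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Box plot: median, quartiles and outliers for each group.
sns.boxplot(x="gender", y="salary", data=toy, ax=axes[0])
# Violin plot: the full shape of the distribution for each group.
sns.violinplot(x="gender", y="salary", data=toy, ax=axes[1])
# Scatter plot: relationship between two numeric columns.
sns.scatterplot(x="test_score", y="salary", data=toy, ax=axes[2])
fig.tight_layout()
```

Box and violin plots suit a numeric column split by a category, while a scatter plot suits two numeric columns.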
In [113]:
# import libraries
import matplotlib
import IPython
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import pylab
import seaborn as sns
%matplotlib inline
In [114]:
# load the data
# hint : look up how to read .csv files in pandas
job_data = """ fill in something here!"""
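One possible way to approach the loading step is `pd.read_csv`. The sketch below reads a tiny in-memory CSV so that it runs without the data file. The rows, the column names and the name `job_data_demo` are made up for illustration; with the real file the call would presumably be `pd.read_csv("am_jobdata.csv")`.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for am_jobdata.csv; these rows and column names are made up.
csv_text = """salary,gender,test_score
300000,1,520
,0,610
450000,0,480
"""

# pd.read_csv accepts a file path or a file-like object.
job_data_demo = pd.read_csv(io.StringIO(csv_text))

print(job_data_demo.head())        # first few rows
print(job_data_demo.dtypes)        # column types as inferred by pandas
print(job_data_demo.isna().sum())  # number of missing values per column
```

Checking `dtypes` and the missing-value counts right after loading shows which columns need `dropna()` before plotting.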
In [27]:
# Let's see what the salaries look like!
# Series.max() / Series.min() skip missing values (NaN) by default.
mx = job_data.salary.max()
mn = job_data.salary.min()
print("Max is : " + str(mx))
print("Min is : " + str(mn))
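Missing salaries matter when computing extremes. Python's built-in `max` and `min` compare values one by one, and every comparison with NaN is false, so the result depends on where the NaN sits. The pandas methods skip NaN by default. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up salaries with a missing value in first position.
salaries = pd.Series([np.nan, 300000.0, 450000.0])

# The built-in max keeps the leading NaN, because no later value compares greater.
builtin_max = max(salaries.tolist())

# Series.max() ignores NaN unless skipna=False is passed.
pandas_max = salaries.max()

print(builtin_max, pandas_max)

# describe() gives count, mean, spread and quartiles in one call.
print(salaries.describe())
```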
In [129]:
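# Histogram of salaries with a density curve on top (missing values dropped first).
# seaborn was imported as sns above, so the calls must go through sns.
sns.distplot(job_data.salary.dropna(), bins=100, rug=False)
sns.kdeplot(job_data.salary.dropna(), shade=True, color='blue')
# sns.rugplot(job_data.salary.dropna(), color='pink')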
In [128]:
# Let's plot males (gender == 1) vs females (gender == 0)!
# Create data categories
m = job_data['gender']==1
f = job_data['gender']==0
sns.kdeplot(job_data[m].salary.dropna(),color='b')
sns.kdeplot(job_data[f].salary.dropna(),color='pink')